Libris Britannia 4

Text File | 1993-12-20 | 10KB | 155 lines

FAULT TOLERANT SYSTEMS

feature article

By Adam Donnison

Before any discussion of fault tolerant systems can begin we must define the term. Unfortunately there seem to be many varied interpretations of what fault tolerant actually means, so let's at least start with a baseline premise. Fault tolerant systems must tolerate faults. This may seem self-evident, but we need to examine what this really means. My definition of fault tolerant is somewhat more encompassing:

A fault tolerant system must be capable of continuing operation even if a major portion of the system becomes inoperable or suffers damage.

Note that I haven't said that the system should be immune to faults, but that it must be capable of continuing near-normal operation in the face of faults which would cripple a lesser system. To that end a number of vendors have spent a lot of time and money developing systems which try to meet these goals.

Many of the fault tolerant systems that we see today have their roots in the mainframe and minicomputer areas. In these areas price was a lesser concern than performance or reliability. As such, many diverse systems were implemented to handle the inevitable component breakdown and be able to recover or to carry on regardless. With the increasing insistence on price/performance and reliability in office systems, some of the mainframe techniques have found favour in a modified form for the PC architecture machines.

Chief amongst these techniques is that of disk redundancy. A system of redundant disks called RAID (Redundant Array of Inexpensive Disks) is leading the way in cost-effective data redundancy.
RAID was originally defined as a system of 5 levels of redundancy by Patterson, Gibson and Katz in 1987. The RAID Implementation Level defines the type of system and covers a range of techniques.

RAID Level 1 is basic disk mirroring. By this we mean that disks are arranged in pairs with data being written to both disks at the same time. When data is read from a disk and a disk error results, the disk is reported as having an error and the data is taken from the "good" disk. This system requires 2 drives of equal capacity to give you a single effective drive. Obviously cost becomes a major factor in this implementation as the disk capacity needed is double that normally required.

RAID 2 stores ECC (Error Correcting Code) on redundant drives. As most drives store ECC data at the end of each sector this level is hardly ever used.

RAID 3 is a high-performance system that uses multiple drives and "stripes" the information for each block across multiple drives, and uses a parity block on a redundant drive. The block is then read from all drives simultaneously and the parity checked against the parity stripe. This system offers increased cost-effectiveness over RAID 1 in that only one extra drive is required in a system. This means that the more drives you use, the less the extra cost involved. Most RAID 3 systems use between 3 and 8 drives, requiring between 50% and 15% extra cost for the parity drive.

In order to achieve good throughput the drive spindles should be synchronised, otherwise a block read may require waiting for each drive in turn to spin to the required point. As most drives do not allow motor synchronising, this system is relatively uncommon.

RAID 4 is similar to RAID 3 but removes the synchronisation problems by requiring a disk block to be restricted to a single drive. There is still a separate parity drive, so overlapped reads cannot be achieved as each read requires a read of the parity drive. This limitation restricts the use of RAID 4 and it is seldom used.

RAID 5 eliminates the problems with RAID 4 in that parity information is stored in a round-robin fashion amongst all drives in the array. It has the advantages of RAID 3 in that the extra cost is limited, but suffers in performance when compared with RAID 1. Still, RAID 5 is probably the most cost-effective system in redundant disk technology.

Does this mean that fault tolerance is just RAID? Well, not really. RAID is simply a disk management technology. If your power supply disappears you can forget about RAID doing thing one about the situation. Fault tolerance must also address issues of power supply, CPU redundancy and so on. So let's have a look at what's available there.

Power supply.

In the main the best form of defence in this area is to use a UPS. Most UPS suppliers now provide low cost "Intelligent UPS" units. These are technically not UPS (Uninterruptible Power Supplies) but more a sophisticated method of ensuring a clean transition from the powered-on to the powered-off state and back. They are really just glorified short-term battery backups with intelligence to let the computer know that it has a short time to close all its files and do an orderly shutdown before power will be removed. When the power comes back on the system will wait until it has recharged its batteries enough for another power-down before starting the computer up again.

This obviously is not ideal in a "non-stop" environment but for most of us it does what is really needed, and that is the guaranteed orderly shutdown and restart. As more and more systems start using virtual memory, more and more file information is transient and subject to problems at power-down. This is the next point we can look at.

File Systems.

There is a lot of noise being made about the "new" technologies of NT versus UnixWare and how these (and other) systems are improving the lot of multi-user, high-power workstation users. It just turns out that both of these offerings (and SCO for that matter) offer the Veritas file system either as a standard or an optional feature. This file system is very much like that used in the mainframe arena. It uses a transaction logging mechanism that allows the file system to be checkpointed so that information that is waiting to be written to disk when the power goes down is actually logged as transactions to be completed. When the system returns to power the file system is returned to its last checkpoint and the pending transactions are completed.

This is not the only "safe" filesystem technology around but must have something going for it. This does not preclude the use of a UPS but does make for a much safer system.

CPU.

The only other major point that can fail is the CPU. This is being addressed by a number of vendors supplying symmetrical multi-processing systems. In these all CPUs are capable of running all jobs (i.e. there is no master/slave relationship) and so should any CPU become unavailable its jobs can be evenly distributed across the remaining processors. Obviously this is an expensive alternative but, depending on your environment, it may well be worth the cost.

Other issues.

Obviously there are other issues that contribute to fault tolerance. Some of these are security related and so should properly be discussed elsewhere, whilst others relate to such things as network access to multiple machines for users. This really comes under the title of network management and needs another few pages to discuss.

Basically what we have seen is that fault-tolerance is available on the desktop NOW, but at a price. The level of fault tolerance you want is controlled mainly by your wallet and your environment.
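
For readers who want to see the parity idea behind RAID 3 to 5 in concrete terms, here is a minimal Python sketch. It assumes parity is a byte-wise XOR across the data blocks, which is the usual scheme but is not spelled out in the article itself. The function names (xor_blocks, parity_overhead_percent) and the sample blocks are hypothetical and chosen purely for illustration, and nothing here models a real disk controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_overhead_percent(total_drives):
    """Extra cost of the single parity drive, as a percentage of the data drives."""
    return 100.0 / (total_drives - 1)


# Hypothetical array: three data drives, each holding one 4-byte block.
data = [b"FAUL", b"T-TO", b"LERA"]

# The parity drive stores the XOR of the data blocks.
parity = xor_blocks(data)

# Pretend drive 1 has failed: XOR the surviving blocks with the parity block
# to rebuild the missing one.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Extra cost of the parity drive for the smallest and largest typical arrays.
for drives in (3, 8):
    print(drives, "drives:", round(parity_overhead_percent(drives), 1), "% extra")
```

With 3 drives the overhead works out to 50%, and with 8 drives to a little over 14%, which is in line with the 50% to 15% range quoted for RAID 3 above.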